Conversation

@newokaerinasai
Contributor

Have you read the Contributing Guidelines?

Sure

Describe your changes

This PR adds support for the following new features:

  1. Evaluation API
  2. CSV files (for Evaluation purposes)

@newokaerinasai requested review from VProv and mryab July 10, 2025 11:55
    )
elif type == "compare":
    # Check if either model-a or model-b config/name is provided
    model_a_provided = model_a_field or any(
Member

Suggested change
-    model_a_provided = model_a_field or any(
+    model_a_provided = model_a_field is not None or any(
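
The distinction matters because a plain truthiness test conflates "not provided" with "provided but falsy". A tiny illustration of the point (the value below is made up for the example):

```python
model_a_field = ""             # hypothetical: an explicitly passed empty value
bool(model_a_field)            # False -> "model_a_field or any(...)" treats it as missing
model_a_field is not None      # True  -> counted as provided, as the check intends
```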

Member

By the way, should this be any or all?

Contributor Author

If it's "all", then with an incomplete set of parameters it would fail with the error "model_a and model_b are required for compare evaluation", which is not a correct explanation of the problem. Here the only check is "at least something is present"; more granular validation is added below.
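
For readers following the thread, a minimal sketch of the two-stage check being described; the helper name and exact variables are assumptions, not the merged code:

```python
import click

def _require_both_models(model_a_field, model_a_config_params,
                         model_b_field, model_b_config_params):
    # Stage 1: "at least something is present" on each side (hence any, not all)
    model_a_provided = model_a_field is not None or any(model_a_config_params)
    model_b_provided = model_b_field is not None or any(model_b_config_params)
    if not (model_a_provided and model_b_provided):
        raise click.BadParameter(
            "model_a and model_b are required for compare evaluation"
        )
    # Stage 2 (covered further down): granular checks such as rejecting
    # a partially specified model config
```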

)
parameters: Union[ClassifyParameters, ScoreParameters, CompareParameters]
# Build parameters based on type
if type == "classify":
Member

Can we also check that parameters which are not applicable for the evaluation type (e.g., the model-a and model-b parameters for classify and score) are not passed? For instance, we could raise a ValueError if the user passes parameters that don't apply to the chosen evaluation type.

Contributor Author

Ok
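
A minimal sketch of the guard being requested, assuming the options are collected into dicts keyed by flag name; this is an illustration, not the merged implementation:

```python
def _reject_inapplicable_params(type, model_a_params, model_b_params):
    """Raise if model-a/model-b options were passed for an evaluation type that ignores them."""
    if type in ("classify", "score"):
        passed = [
            flag for flag, value in {**model_a_params, **model_b_params}.items()
            if value is not None
        ]
        if passed:
            raise ValueError(
                f"Parameters {', '.join(passed)} are not applicable for {type!r} evaluation"
            )
```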

Comment on lines 395 to 418
if model_b_field is not None:
    # Simple mode: model_b_field is provided
    if any(model_b_config_params):
        raise click.BadParameter(
            "Cannot specify both --model-b-field and config parameters (--model-b-name, etc.). "
            "Use either --model-b-field alone if your input file has pre-generated responses, "
            "or config parameters if you generate it on our end"
        )
    model_b_final = model_b_field
elif any(model_b_config_params):
    # Config mode: config parameters are provided
    if not all(model_b_config_params):
        raise click.BadParameter(
            "All model config parameters are required when using detailed configuration: "
            "--model-b-name, --model-b-max-tokens, --model-b-temperature, "
            "--model-b-system-template, --model-b-input-template"
        )
    model_b_final = {
        "model_name": model_b_name,
        "max_tokens": model_b_max_tokens,
        "temperature": model_b_temperature,
        "system_template": model_b_system_template,
        "input_template": model_b_input_template,
    }
Member

Is it possible to move these checks inside client.evaluation.create? Looks like right now we do not have strong validation for inputs if people use the Python SDK directly instead of the CLI

Contributor Author

Makes sense. That's what I've done
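
For illustration, validation living inside the SDK entry point could look roughly like this; the create signature shown here is an assumption, not the actual merged method:

```python
class Evaluation:
    def create(self, type, model_a=None, model_b=None, **kwargs):
        # Validate here so direct SDK callers get the same errors as the CLI
        if type == "compare":
            if model_a is None or model_b is None:
                raise ValueError("model_a and model_b are required for compare evaluation")
        elif type in ("classify", "score"):
            if model_a is not None or model_b is not None:
                raise ValueError(f"model_a/model_b are not applicable for {type!r} evaluation")
        # ... build the request payload and call the evaluation endpoint ...
```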

@newokaerinasai marked this pull request as ready for review July 22, 2025 12:52
@newokaerinasai merged commit 2e34944 into main Jul 22, 2025
7 of 10 checks passed
@newokaerinasai deleted the add_evals branch July 22, 2025 12:52